Automatically Tracking Metadata and Provenance of Machine Learning Experiments
Abstract
We present a lightweight system to extract, store and manage metadata and provenance information of common artifacts in machine learning (ML) experiments: datasets, models, predictions, evaluations and training runs. Our system speeds up users' ML workflows and provides a basis for comparability and repeatability of ML experiments. We achieve this by tracking the lineage of produced artifacts and automatically extracting metadata such as hyperparameters of models, schemas of datasets or layouts of deep neural networks. Our system provides a general declarative representation of these ML artifacts, is integrated with popular frameworks such as MXNet, SparkML and scikit-learn, and meets the demands of various production use cases at Amazon.
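The idea of extracting metadata automatically and linking artifacts by lineage can be illustrated with a minimal Python sketch. This is not the system described in the abstract: the `Artifact` class, the `dataset_schema` and `hyperparameters` helpers, and `ToyModel` are all hypothetical names introduced here for illustration, and the sketch uses only the standard library rather than MXNet, SparkML or scikit-learn.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical tracked ML artifact (dataset, model, ...) with metadata and lineage."""
    kind: str
    metadata: dict
    parents: list = field(default_factory=list)  # ids of artifacts this one was derived from

    @property
    def id(self) -> str:
        # Content-derived identifier: same kind, metadata and parents give the same id.
        payload = json.dumps(
            {"kind": self.kind, "metadata": self.metadata, "parents": self.parents},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def dataset_schema(rows):
    """Infer a simple column -> type-name schema from a list of dict rows."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, type(value).__name__)
    return schema


def hyperparameters(model):
    """Collect public, non-callable instance attributes of a model as hyperparameters."""
    return {
        name: value
        for name, value in vars(model).items()
        if not name.startswith("_") and not callable(value)
    }


class ToyModel:
    """Stand-in for a real estimator; assumed for illustration only."""

    def __init__(self, learning_rate=0.1, depth=3):
        self.learning_rate = learning_rate
        self.depth = depth


rows = [{"age": 31, "income": 52000.0}, {"age": 45, "income": 61000.5}]
dataset = Artifact("dataset", {"schema": dataset_schema(rows), "num_rows": len(rows)})
# The model records the dataset's id as its parent, which is the lineage link.
model = Artifact("model", {"hyperparameters": hyperparameters(ToyModel())}, parents=[dataset.id])
```

Because each identifier is derived from the artifact's content and its parents, two training runs with the same hyperparameters on the same dataset map to the same model record, which is one simple way to support the comparability the abstract mentions.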
Related resources
Augmenting geospatial data provenance through metadata tracking in geospatial service chaining
In a service-oriented environment, heterogeneous data from distributed data archiving centers and various geo-processing services are chained together dynamically to generate on-demand data products. Creating an executable service chain requires detailed specification of metadata for data sets and service instances. Using metadata tracking, semantics-enabled metadata are generated and propagate...
Declarative Model Discovery in Provenance Data for Aiding in Scientific Experiment Planning
Data provenance manages a collection of metadata cataloging the origin and history of data. In scientific workflows, this metadata supports scientific experiment planning. However, the amount of provenance data generated from scientific workflow executions can grow over time, making it infeasible to evaluate manually. Thus, mechanisms for automatically extracting and presenting knowledge from p...
Using Provenance to Extract Semantic File Attributes
Rich, semantically descriptive file attributes are valuable in many contexts, such as semantic namespaces and desktop search. Descriptive attributes help users to find files placed in seemingly-arbitrary locations by different applications. However, extracting semantic attributes from file contents is nontrivial. An alternative is to examine file provenance: how and when files are used, and the...
Using Provenance for Personalized Quality Ranking of Scientific Datasets
The rapid growth of eScience has led to an explosion in the creation and availability of scientific datasets that include raw instrument data and derived datasets from model simulations. A large number of these datasets are surfacing online in public and private catalogs, often annotated with XML metadata, as part of community efforts to foster open research. With this rapid expansion comes th...
A Foundational Ontology to Support Scientific Experiments
Provenance is a term used to describe the history, lineage or origins of a piece of data. In computationally intensive scientific experiments, data resources are produced at large scale. Thus, as more scientific data are produced, the importance of tracking and sharing their metadata grows. Therefore, it is desirable to make this metadata easy to access, share, reuse, integrate and reason about. To add...